3 Methodology

In this section, we describe the methodology we follow for this analysis. We first describe how we created a dataset on which to estimate park activity location choices for a sample of observed trips in Alameda County, California. Then we provide an overview of destination choice modeling and using such models to derive utility-based accessibility.

3.1 Study area

Alameda County is one of nine counties that constitute the San Francisco Bay Area metropolitan region in California. Alameda is the seventh most populous county in California with a population of 1.5 million (U.S. Census Bureau 2019), and has 14 incorporated cities and several unincorporated communities. It is an economically and ethnically diverse county and hence it was an attractive area to use for this study. The racial makeup of Alameda County was 49.7% White, 11.2% African American, 1.0% Native American, 38.7% Asian, 1.0% Pacific Islander, and 22.4% Hispanic or Latino of any race. Alameda County has a diverse set of parks, ranging from local small community parks, urban and transit-accessible parks like the Lake Merritt Recreational area, coastal access points, and suburban recreational areas like Lake Chabot.

3.2 Data

We constructed an estimation data set from a publicly-available parks polygons layer, a commercially acquired passive device origin-destination table representing visitors to parks and inferred residential block groups for these visitors, and American Community Survey data for the residence block groups. We also constructed a dataset representing open street installations that were implemented in response to the COVID-19 pandemic.

3.2.1 Model estimation data

We obtained a polygons shapefile layer representing open spaces in Alameda County, California from the California Protected Areas Database (GreenInfo Network 2019). This dataset was selected because it included multiple different types of open space including local and state parks, traditional green spaces as well as wildlife refuges and other facilities that can be used for recreation. We removed facilities that did not allow open access to the public (such as the Oakland Zoo) and facilities whose boundaries conflated with freeway right-of-way – this prevents trips through the park from being conflated with park trips in the passive origin-destination data.

not_parks <- c("3455", "15886", "13243")
parks <- st_read(here("data/bayarea_parks.geojson"), quiet = TRUE) %>%
  filter(COUNTY == "Alameda") %>%
  transmute(
    id = as.character(UNIT_ID), name = UNIT_NAME, 
    access = factor(ACCESS_TYP, levels = c("Open Access", "No Public Access", 
                                           "Restricted Access")),
    acres = ACRES, type = DES_TP) %>%
  filter(!id %in% not_parks) %>%
  filter(access == "Open Access") %>%
  select(id, access, acres, type)  %>%
  st_transform(this_crs)

From this initial parks polygons dataset, we obtained park attribute information through OpenStreetMap (OSM) using the osmdata package for R (Padgham et al. 2017). Three attribute elements are considered in this analysis. First, we identify playgrounds using OSM facilities given a leisure = playground tag. The tagged facilities can be either polygon or point objects; in the former case we use the polygon centroid to determine the point location of the playground.

Second, we consider sport fields of various kinds identified with the OSM leisure = pitch tag. This tag has an additional attribute describing the sport the field is designed for, which we retain in a consolidated manner. Soccer and American football fields are considered in a single category, and baseball and softball fields are combined as well. Basketball, tennis, and volleyball courts are kept as distinct categories, with all other sport-specific fields combined into a single “other.” Golf courses are discarded. As with playgrounds, polygon field and court objects are converted into points at the polygon centroid.

pitches_osm <- opq(bb) %>% # specify boundary for query
  add_osm_feature(key = "leisure", value = "pitch") %>% 
  osmdata_sf() %>% 
  trim_osmdata(bb, exclude = TRUE)

pitches <- rbind(
  # convert polygons to centroids
  pitches_osm$osm_polygons %>%
    st_transform(this_crs) %>%
    st_centroid() %>%
    select(osm_id, sport),
  # get points
  pitches_osm$osm_points %>%
    st_transform(this_crs) %>%
    select(osm_id, sport)
  # no multipolygons or polylines that need to be included
) %>%
  filter(sport != "golf") %>%
  mutate(
    sport = case_when(
      grepl("football", sport) | grepl("soccer", sport) ~ "football / soccer",
      grepl("baseball", sport) | grepl("softball", sport) ~ "baseball",
      grepl("basketball", sport) ~ "basketball",
      grepl("tennis", sport) ~ "tennis",
      grepl("volleyball", sport) ~ "volleyball",
      TRUE ~ "other_pitch"
    )
  )

Finally, we identified trails and footpaths using the path, cycleway, and footway values of the highway tag. A visual inspection of the resulting data revealed that the large preponderance of sidewalks and cycling trails within parks in Alameda County are identified properly with these variables. Trails are represented in OSM as polylines, or as polygons if they form a complete circle. In the latter case, we converted the polygon boundary into an explicit polyline object.

It is possible for each of these facilities to exist outside the context of a public park. For example, many private apartment complexes have playgrounds and high schools will have sports facilities that are not necessarily open to the general public. We spatially matched the OSM amenities data and retained only those facilities that intersected with the CPAD open spaces database identified earlier.

# Append attributes to park frame ===============
parks_with_playgrounds <-  parks %>%  
  # compute yeo-johnson transformation on acres
  # determine if a playground point is inside the park polygon boundary
  st_join(playgrounds %>% select(playground), st_intersects)  %>%
  # a park with multiple playgrounds will join multiple times, so we need to 
  # summarize them back down.
  group_by(id) %>% slice(1) %>% ungroup() %>%
  mutate(playground = ifelse(is.na(playground), F, playground)) %>%
  select(id, playground) %>%
  st_set_geometry(NULL)
  
# determine if the pitch points are inside the park polygon boundaries
parks_with_pitches <- parks %>%
  select(id) %>%
  st_join(pitches %>% select(sport)) %>%
  st_set_geometry(NULL) %>%
  # multiple pitches in a park result in repeated rows. 
  group_by(id, sport) %>% summarise(count = n()) %>%
  pivot_wider(id_cols = id, values_from = count, names_from = sport, 
              values_fill = FALSE) %>%
  mutate(
    pitch = ifelse(`NA` == 1, FALSE, TRUE),
    across(baseball:tennis, ~ifelse(. > 0, TRUE, FALSE)),
  ) %>%
    select(-`NA`)

# determine which walking paths are inside park polgon boundaries
parks_with_trails <- parks %>% 
  # determine if the trails are inside the park polygon boundaries
  st_join(trails %>% select(trail = osm_id), st_intersects) %>%
  # multiple pitches in a park result in repeated rows. 
  group_by(id) %>% slice(1) %>% ungroup() %>%
  mutate(trail = ifelse(is.na(trail), F, T)) %>%
  select(id, trail) %>%
  st_set_geometry(NULL)

# join together and write out =================
attributed_parks <- parks %>%
  # compute yeo-johnson transformation on acres
  mutate(yj_acres = VGAM::yeo.johnson(acres, lambda = 0)) %>%
  left_join(parks_with_playgrounds, by = "id") %>%
  left_join(parks_with_pitches,     by = "id") %>%
  left_join(parks_with_trails,      by = "id") 
dj1 <- wesanderson::wes_palette("Darjeeling1")
if(knitr::is_latex_output()) {
  pbox <- st_bbox(st_transform(parks, 4326))
  alameda_back <- get_map(c(lon = mean(pbox[c(1, 3)]), 
                            lat = mean(pbox[c(2, 4)]) ), 
                          zoom = 10, source = "stamen", color = "bw")
  alameda_back <- ggmap_bbox(alameda_back)
  
  ggmap(alameda_back, extent = "device") + 
    coord_sf(crs = st_crs(3857), 
             xlim = c(-122.29674 , -122.16358), 
             ylim = c( 37.74098, 37.87028), expand = FALSE) + 
    geom_sf(data = parks %>% st_transform(3857), inherit.aes = FALSE,
            fill = dj1[2], lwd = 0)  + 
    theme(axis.line = element_line(color = NA))  + 
    xlab("") + ylab("")
} else {
  leaflet() %>%
    addProviderTiles(providers$Esri.WorldGrayCanvas) %>%
    addPolygons(data = parks %>% st_transform(4326), group = "Parks", color = dj1[2]) %>%
    addCircleMarkers(data = playgrounds %>% st_transform(4326), 
                     group = "Playgrounds", color = dj1[3], radius = 2) %>%
    addCircleMarkers(data = pitches %>% st_transform(4326), 
                     group = "Sport Fields", color = dj1[1], radius = 2) %>%
    addPolylines(data = trails %>% st_transform(4326), 
                 group = "Trails", color = dj1[5]) %>%
    addLayersControl(overlayGroups = c("Parks", "Playgrounds", "Sport Fields", "Trails"))
}

Figure 3.1: Location of parks in Alameda County.

We provided the park boundaries layer to a commercial firm, StreetLight Data Inc., which develops and resells origin-destination matrices derived from passive device location data. The provider employs a proprietary data processing engine (called Route Science) to algorithmically transform observed device location data points (the provider uses in-vehicle GPS units and mobile device LBS) over time into contextualized, normalized, and aggregated travel patterns. From these travel patterns, the Route Science processing algorithms infer likely home Census block group locations for composite groups of people and converts raw location data points into trip origin and destination points (Pan et al. 2006; Friedrich et al. 2010).

# park flows computed from Streetlight data in R/park_flows.R
park_flows <- read_rds("data/park_flows.rds")

For each park polygon, the firm returned a population-weighted estimate of how many devices were observed by home location block group over several months in the period between May 2018 and October 2018. We transformed this table such that it represented the weighted unique devices traveling between block groups and parks. We discarded home location block groups outside of Alameda County; the imputed home locations can be far away from the study area for a small amount of trips and are unlikely to represent common or repeated park activities. Table 3.1 presents descriptive statistics on the 500 parks assembled for this study, grouped according to the park type as defined on CPAD. A little more than half of the parks have identified paths, while around 40% of the identified parks have playgrounds and sport fields.

attributed_parks <- attributed_parks %>% group_by(type) %>%
  left_join(park_flows %>% group_by(park) %>% slice(1) %>% 
              select(park, attractions), by = c("id" = "park")) %>%
  mutate(attractions = ifelse(is.na(attractions), 0, attractions)) 
write_rds(attributed_parks, "data/attributed_parks.rds")
datasummary_balance(
  ~ Type,  
  data = attributed_parks %>% 
    transmute(Access = access, Acres = acres, 
           Type = ifelse(grepl("Recreation Area", type), "Recreation Area" ,type),
           Playground = playground, 
           `Any Sport Field` = pitch, 
           `Football / Soccer` = `football / soccer`,
           `Baseball` = baseball, `Basketball` = basketball,
           `Tennis` = tennis, `Volleyball` = volleyball,
           Trail = trail, 
           `Mobile Devices` =attractions), 
  dinm = FALSE, caption = "Park Summary Statistics", booktabs = TRUE)
Table 3.1: Park Summary Statistics
Local Park (N=441)
Recreation Area (N=59)
Mean Std. Dev. Mean Std. Dev.
Acres 59.8 370.5 125.6 505.1
Mobile Devices 1450.0 6685.4 2659.0 6161.0
N % N %
type Local Park 441 100.0 0 0.0
Local Recreation Area 0 0.0 57 96.6
State Recreation Area 0 0.0 2 3.4
Access Open Access 441 100.0 59 100.0
No Public Access 0 0.0 0 0.0
Restricted Access 0 0.0 0 0.0
Playground FALSE 220 49.9 44 74.6
TRUE 221 50.1 15 25.4
Any Sport Field FALSE 274 62.1 39 66.1
TRUE 167 37.9 20 33.9
Football / Soccer FALSE 414 93.9 51 86.4
TRUE 27 6.1 8 13.6
Baseball FALSE 363 82.3 45 76.3
TRUE 78 17.7 14 23.7
Basketball FALSE 340 77.1 52 88.1
TRUE 101 22.9 7 11.9
Tennis FALSE 387 87.8 53 89.8
TRUE 54 12.2 6 10.2
Volleyball FALSE 433 98.2 57 96.6
TRUE 8 1.8 2 3.4
Trail FALSE 155 35.1 22 37.3
TRUE 286 64.9 37 62.7

In order to understand the demographic makeup of the home block groups and potentially the characteristics of the people who make each trip, we obtained 2013-2017 five-year data aggregations from the American Community Survey using the tidycensus (Walker 2019) interface to the Census API for several key demographic and built environment variables: the share of individuals by race, the share of households by income level, household median income, the share of households with children under 6 years old, and the household density. An important attribute of the choice model is the distance from the home block group to the park boundary. Because we have no information on where in the block group a home is actually located, we use the population-weighted block group centroid published by the Census Bureau as the location for all homes in the block group. We then measured the network-based distance between the park and the home block group centroid using a walk network derived from OpenStreetMap via the networkx package for Python (Hagberg, Schult, and Swart 2008),

variables <- c(
  "population" = "B02001_001", # TOTAL: RACE
  "housing_units" = "B25001_001", # HOUSING UNITS
  "households" = "B19001_001", #HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS)
  # Hispanic or Latino Origin by Race
  "white" = "B03002_003",
  "black" = "B03002_004",
  "asian" = "B03002_006",
  "hispanic" = "B03002_012",
  #MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS)
  "income" = "B19013_001",
  # FAMILY TYPE BY PRESENCE AND AGE OF RELATED CHILDREN
  "children_c06" = "B11004_004", # married under 6 only
  "children_c6+" = "B11004_005", # married under 6 and older
  "children_m06" = "B11004_011", # male under 6 only
  "children_m6+" = "B11004_012", # male under 6 and older
  "children_f06" = "B11004_017", # female under 6 only
  "children_f6+" = "B11004_018", # female under 6 and older
  #HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS)
  "inc_0010" = "B19001_002",  "inc_1015" = "B19001_003", "inc_1520" = "B19001_004",
  "inc_2025" = "B19001_005", "inc_2530" = "B19001_006", "inc_3035" = "B19001_007",
  "inc_125"  = "B19001_015", "inc_150"  = "B19001_016", "inc_200"  = "B19001_017"
)

acs <- get_acs(geography = "block group", variables = variables,
               state = "CA", county = "Alameda", geometry = TRUE) %>%
  select(-moe) %>%
  spread(variable, estimate) %>%
  # area is in m^2, change to km^2
  mutate(area = as.numeric(st_area(geometry) * 1e-6)) %>%
  transmute(
    geoid = GEOID,
    group = 1,
    population, households, housing_units, 
    density = households / area,
    income,
    # many of the variables come in raw counts, but we want to consider
    # them as shares of a relevant denominator.
    children = 100 * ( children_c06 + `children_c6+` + 
                         children_m06 + `children_m6+` + 
                         children_f06 + `children_f6+`) / households,
    lowincome    = 100 * (inc_0010 + inc_1015 + inc_1520 + inc_2530 +
                            inc_3035) / households,
    highincome   = 100 * (inc_125 + inc_150 + inc_200) / households,
    black        = 100 * black / population,
    asian        = 100 * asian / population,
    hispanic     = 100 * hispanic / population,
    white        = 100 * white / population
  ) %>%
  filter(population > 0)

write_rds(acs, "data/blockgroups.rds")
# distances computed in py/shortest_paths.py
shortest_path_dir <- "data/shortest_paths"
if(!dir.exists(shortest_path_dir)) {
  # CTV: This bit doesn't work on my computer (can't extract 7z
  # files from code? So I just extracted them from the 7zip utility
  # within Windows Explorer prior to running)
  system2("7z", args = c("x", "data/shortest_paths.7z", "-odata/"))
}
path_files <- dir(shortest_path_dir, full.names = TRUE)
distance_df <- lapply(path_files, function(file) {
  read_csv(file, col_types = list(geoid = col_character(), park_id = col_character())) %>%
  mutate(
    yj_distance = VGAM::yeo.johnson(distance, lambda = 0),
    yj_euc_dist = VGAM::yeo.johnson(euc_dist, lambda = 0)
  )
}) %>%
  bind_rows() %>%
  rename(home = geoid, park = park_id) %>%
  mutate(home = str_pad(home, width = 12, side = "left", pad = "0"))
write_rds(distance_df, "data/distances.rds")
acs_for_table <- acs %>% select(
    "Density: Households per square kilometer" = density,
    "Income: Median tract income" = income,
    "Low Income: Share of households making less than $35k" = lowincome,
    "High Income: Share of households making more than $125k" = highincome,
    "Children: Share of households with children under 6" = children,
    "Black: Share of population who is Black" = black,
    "Asian: Share of population who is Asian" = asian,
    "Hispanic: Share of population who is Hispanic*" = hispanic,
    "White: Share of population who is White" = white)

datasummary_skim(acs_for_table, title = "Block Group Summary Statistics",
  histogram = !knitr::is_latex_output()) %>%
  kableExtra::kable_styling(latex_options = c("scale_down")) %>%
  kableExtra::column_spec(1, width = "4cm") %>%
  kableExtra::footnote(symbol = "Hispanic indicates Hispanic individuals of all races; non-Hispanic individuals report a single race alone.") 
Table 3.2: Block Group Summary Statistics
Unique (#) Missing (%) Mean SD Min Median Max
Density: Households per square kilometer 1041 0 1720.6 1582.2 0.0 1333.2 20890.5
Income: Median tract income 981 3 99928.8 46889.3 10961.0 92610.5 250001.0
Low Income: Share of households making less than $35k 993 0 17.1 13.4 0.0 13.7 89.2
High Income: Share of households making more than $125k 1011 0 35.8 20.7 0.0 33.9 100.0
Children: Share of households with children under 6 984 0 14.8 8.9 0.0 13.6 52.3
Black: Share of population who is Black 918 0 11.6 13.7 0.0 6.3 77.4
Asian: Share of population who is Asian 1019 0 26.2 20.4 0.0 20.2 89.5
Hispanic: Share of population who is Hispanic* 1026 0 22.2 18.8 0.0 15.7 85.9
White: Share of population who is White 1036 0 34.0 22.8 0.0 29.6 100.0
* Hispanic indicates Hispanic individuals of all races; non-Hispanic individuals report a single race alone.

3.2.2 Model application data

slowstreets <- st_read("data/slow_streets.geojson", quiet = TRUE)  %>%
  st_transform(this_crs) 

street_parks <- slowstreets %>%
  st_buffer(dist = 25)  %>% # 50 foot wide park ( 4 lanes of traffic)
  mutate(acres = as.numeric(st_area(.)) / 43560) %>% 
  mutate() %>%
  transmute(
    id = str_c("st", ID), access = "Open Access", acres, type = "Street",
    yj_acres = VGAM::yeo.johnson(acres, lambda = 0),
    playground = FALSE, baseball = FALSE, basketball = FALSE, 
    `football / soccer` = FALSE, other_pitch = FALSE, volleyball = FALSE,
    tennis = FALSE, pitch = FALSE, trail = TRUE, attractions = NA
  )
  
write_rds(street_parks, "data/street_parks.rds")

We compiled a list of streets in Berkeley, Oakland, and Alameda that were converted to public open space from each city’s respective websites (City of Alameda 2020; City of Oakland 2020; City of Berkeley 2020) and referred to the “Shifting Streets” COVID-19 mobility dataset (T. Combs et al. 2020) to determine whether other cities and places within Alameda County have similar Open Streets projects (as best as we could determine, they did not). Based on the information we gathered from these sources, 74 individual streets were converted; these streets represent 27.6 total miles across the cities of Berkeley, Oakland, and Alameda. For the purposes of this analysis, we represent each opened street as a single “park” without any sport facilities or playgrounds, but with a trail / walking path. The database provides the opened streets as polyline objects; we assert a 25-foot buffer around the line to represent a polygon with a measurable area. The 25-foot buffer effectively counts one vehicle lane and one shoulder parking lane in each direction as converted to “park” space. Finally, we measure the network-based distance from each population-weighted block group centroid to the nearest boundary of each new open space facility created by this policy.

3.3 Model estimation

In random utility choice theory, if an individual living in block group \(n\) wishes to make a park trip, the probability that the individual will choose park \(i\) from the set of all parks \(J\) can be described as a ratio of the park’s measurable utility \(V_{ni}\) to the sum of the utilities for all parks in the set. In the common destination choice framework we apply a multinomial logit model (D. L. McFadden 1974; Recker and Kostyniuk 1978), \[\begin{equation}\label{eq:p} P_{ni} = \frac{\exp(V_{ni})}{\sum_{j \in J}\exp(V_{nj})} \end{equation}\] where the measurable utility \(V_{ni}\) is a linear-in-parameters function of the destination attributes. \[\begin{equation}\label{eq:V} V_{ni} = X_{ni}\beta \end{equation}\] where \(\beta\) is a vector of estimable coefficients giving the relative utility (or disutility) of that attribute to the choice maker, all else equal. It is possible to add amenities of the park or the journey to the utility equation. However, as the number of alternatives is large, it is impractical to consider alternative-specific constants or coefficients and therefore not possible to include attributes of the home block group or traveler \(n\) directly. We can, however, segment the data and estimate distinct distance and size parameters for different segments to observe heterogeneity in the utility parameters between different socioeconomic groups.

The logarithm of the sum in the denominator of Equation (called the logsum) provides a measure of the consumer surplus of the choice set enjoyed by person \(n\) (Williams 1977), \[\begin{equation} CS_n = \ln{{\sum_{j \in J}\exp(V_{nj})}} + C \tag{3.1} \end{equation}\] where \(C\) is a constant indicating an unknown absolute value; the difference of logsum values in two different scenarios eliminates \(C\). Additionally, dividing the difference in logsum from choice set \(J\) and choice set \(J'\) by a cost coefficient \(\beta\) \[\begin{equation} \delta CS_n = (\ln{\sum_{j \in J'}\exp(V_{nj})} - \ln{\sum_{j \in J}\exp(V_{nj})})/\beta \tag{3.2} \end{equation}\] gives an estimate of the benefit received by person \(n\) in monetary terms. Thus, such a “utility-based” accessibility term is continuously defined, contains multiple dimensions of the attributes of the choice, and can be evaluated in monetary terms (Handy and Niemeier 1997; Dong et al. 2006).

set.seed(42)
n_obs <- 20000
n_alts <- 10
n_flow <- sum(park_flows$flow)
ll0e <- sum(n_obs *.8 * log(1/n_alts))
mydata <- park_flows %>%
  ungroup() %>%
  mutate(weight = flow / sum(flow)) %>%
  sample_n(n_obs, replace = TRUE, weight = weight) %>%
  transmute(id = row_number(), home, alt_0 = park, 
            validation = sample(c(TRUE,FALSE), n(), TRUE, prob = c(0.2, 0.8)))

In the most typical cases, researchers estimate the utility coefficients for destination choice models from household travel surveys. For example, the California add-on to the 2017 National Household Travel Survey could be used for this purpose for frequent trips like commutes to work and school. However, as a 24-hour trip diary, it is less useful for recreational trips that may take place less frequently. For better data on park access, we need to synthesize a suitable estimation data set. We do this by sampling 20,000 random discrete device origin-destination pairs from the commercial passive data matrix, weighted by the volume of the flows. This corresponds to a 4.3% sample of all the observed device origin-destination pairs.

The sampled origin-destination pair gives the home location as well as the “chosen” alternative for a synthetic person. In principle the individual’s choice set contains all the parks in our dataset; in practice it can be difficult to estimate choice models with so many alternatives (\(|J| = 500\)). For this reason we randomly sample 10 additional parks to serve as the non-chosen alternatives for our synthetic choice maker. Such random sampling of alternatives reduces the efficiency of the estimated coefficients but the coefficients remain unbiased (Train 2009). As the model has no alternative-specific constants, the standard likelihood comparison statistic against the market shares model \(\rho^2\) is not computable. We instead use the likelihood comparison against the equal shares model \(\rho_0^2\).

sampled_parks <- lapply(1:n_obs, function(i){
  sample(attributed_parks$id, n_alts)
}) %>%
  unlist() %>%
  matrix(ncol = n_alts) %>%
  as_tibble(.name_repair = ~ str_c("alt", 1:n_alts, sep = "_"))

logitdata <- mydata %>%
  bind_cols(sampled_parks) %>%
  gather(key = "alt", value = "park", -id, -home, -validation) %>%
  mutate(chosen = alt == "alt_0") %>%
  arrange(id, alt) %>%
  
  # append distances
  inner_join(distance_df, by = c("home", "park")) %>%
  
  # append park attributes
  left_join(attributed_parks, by = c("park" = "id")) %>%
  
  # append block group attributes
  left_join(acs %>% select(geoid, density:white) %>% st_set_geometry(NULL),
            by = c("home" = "geoid")
   
)

write_rds(logitdata, "data/logitdata.rds")

The resulting analysis dataset therefore contains 20,000 choice makers that select between 11 parks including the park they were observed to choose; the measured distance between the choice maker’s block group and all parks in the choice set; and the acreage of each park in the choice set. We use the mlogit package for R (Croissant 2019; R Core Team 2020) to estimate the multinomial logit models.

3.4 Model application

Using the full set of Alameda County parks — including those added by street conversion — we can apply a destination choice model to calculate the change in park choice utilities and utility-based accessibility values for each block group in Alameda County. As shown with Equation (3.2), the difference in utility-based accessibility values with and without the opened streets is the additional consumer surplus provided by the policy, converted into a monetary value by a cost-utility coefficient. The dataset used for this research does not have any information on travel costs or entrance fees, and such data would likely not be relevant in the context of urban parks. As a result, there is no direct link between the utility and a monetary cost in our estimated models.

As a substitution, we use an estimate of the cost coefficient obtained from the open-source activity-based travel demand model ActivitySim (ActivitySim 2020), which is itself based on the regional travel model employed by the Metropolitan Transportation Commission (MTC), the San Francisco Bay regional MPO. ActivitySim uses a cost coefficient of \(-0.6\) divided by the each simulated agent’s value of time to determine destination choices for non-work trips.1 In ActivitySim, as in most activity-based travel models, the value of time is considered to vary with an individual’s income, but in this aggregate destination choice model, an aggregate value of time will suffice. The average value of time in the synthetic population for the Bay Area is $7.75 per hour, resulting in a cost coefficient on the destination choice utility of \(-0.215\). Dividing the difference in accessibility logsums by the negative of this value gives an initial estimate of the monetary value of the policy to each park user.


  1. To be precise, this is the cost coefficient on the mode choice model for social, recreational, and other trip purposes, which influences destination choice through a logsum-based impedance term.↩︎